70 research outputs found

    Lyndon Array Construction during Burrows-Wheeler Inversion

    In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses $2n + o(n)$ bits of space and supports constant time access. This representation can be built in linear time using $O(n)$ words of space, or in $O(n \log n / \log\log n)$ time using asymptotically the same space as $T$.
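To make the object concrete, here is a minimal sketch of a Lyndon array computation. It is not the BWT-inversion algorithm from the abstract. It assumes the well-known characterization that the longest Lyndon word starting at position i ends just before the next position j > i whose suffix is lexicographically smaller, and it finds that position with a stack over the inverse suffix array. The function name `lyndon_array` and the naive suffix sort are illustrative choices only.

```python
def lyndon_array(text):
    """Return lam where lam[i] is the length of the longest Lyndon word starting at i.

    Illustrative sketch: the suffix array is built by naive sorting
    (not linear time), then a stack finds the next smaller suffix rank.
    """
    n = len(text)
    # Naive suffix array: positions sorted by the suffix starting there.
    sa = sorted(range(n), key=lambda i: text[i:])
    # Inverse suffix array: isa[pos] is the rank of the suffix at pos.
    isa = [0] * n
    for rank, pos in enumerate(sa):
        isa[pos] = rank

    lam = [0] * n
    stack = []  # positions still waiting for a smaller suffix to their right
    for i in range(n):
        # Suffix i is smaller than every stacked suffix with a larger rank,
        # so it closes the longest Lyndon word starting at those positions.
        while stack and isa[stack[-1]] > isa[i]:
            j = stack.pop()
            lam[j] = i - j
        stack.append(i)
    # Positions with no smaller suffix to the right extend to the end.
    for j in stack:
        lam[j] = n - j
    return lam


print(lyndon_array("banana"))
```

The stack pass is the linear-time part; replacing the naive sort with a linear-time suffix array construction would make the whole sketch linear.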

    External memory BWT and LCP computation for sequence collections with applications

    We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the partial BWTs in external memory and in the process it also computes the LCP values. We show that our algorithm performs O(n maxlcp) sequential I/Os, where n is the total length of the collection and maxlcp is the maximum LCP value. The experimental results show that our algorithm outperforms the current best algorithm for collections of sequences with different lengths and when the average LCP of the collection is relatively small compared to the length of the sequences. In the second part of the paper, we show that our algorithm can be modified to output two additional arrays that, combined with the BWT and LCP arrays, provide simple, scan based, external memory algorithms for three well known problems in bioinformatics: the computation of the all pairs suffix-prefix overlaps, the computation of maximal repeats, and the construction of succinct de Bruijn graphs
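As a small illustration of the two arrays involved, the sketch below computes the BWT and LCP array of a sequence collection naively and entirely in memory. It is not the external memory merging algorithm described above. It assumes each sequence gets its own end marker, that end markers sort before all other symbols and are ordered by sequence index, and that they are all printed as `$`. The function name `bwt_lcp_collection` is hypothetical.

```python
def bwt_lcp_collection(seqs):
    """Naive in-memory BWT and LCP array for a collection of sequences.

    Assumption: sequence k ends with a distinct terminator ("", k) that
    sorts before every ordinary symbol and is printed as "$".
    """
    # Encode symbols as tuples so that terminators are distinct and smallest.
    texts = [[(c, 0) for c in s] + [("", k)] for k, s in enumerate(seqs)]
    # Sort all suffixes of all sequences together.
    suffixes = sorted(
        (tuple(t[p:]), k, p) for k, t in enumerate(texts) for p in range(len(t))
    )
    bwt, lcp = [], []
    prev = None
    for suf, k, p in suffixes:
        # Symbol preceding the suffix; position 0 wraps to the terminator.
        sym = texts[k][p - 1][0]
        bwt.append(sym or "$")
        # LCP with the previous suffix in sorted order (0 for the first).
        h = 0
        if prev is not None:
            while h < min(len(prev), len(suf)) and prev[h] == suf[h]:
                h += 1
        lcp.append(h)
        prev = suf
    return "".join(bwt), lcp


print(bwt_lcp_collection(["ab", "b"]))
```

Because terminators are distinct, no LCP value can extend past the end of a sequence, which is why the largest LCP value is bounded by the longest shared substring in the collection.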

    Live neighbor-joining

    Background: In phylogenetic reconstruction the result is a tree where all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, both ancestral and living taxa may coexist, leading to a tree where internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is widely used for phylogenetic reconstruction. Results: We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We have investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descendant taxa. We also applied Live Neighbor-Joining to a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their relationship in terms of content similarity is represented by means of a phylogeny. Conclusion: Our experiments have shown interesting alternative phylogenetic hypotheses for RNA virus genomes, bacterial genomes and alternative relationships among images and texts, illustrating a wide range of scenarios where Live Neighbor-Joining may be used.
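For context, the selection step of the classical Neighbor-Joining heuristic can be sketched as follows. This is the standard method only, not the Live Neighbor-Joining variant, whose details are not given in the abstract. The pair to join minimizes Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The function name `nj_pick_pair` and the five-taxon distance matrix are illustrative assumptions.

```python
def nj_pick_pair(dist):
    """Return (q, i, j) for the pair minimizing the classical NJ criterion.

    dist is a symmetric matrix of pairwise distances with a zero diagonal.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best


# Hypothetical distances among five taxa, for illustration only.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(example))
```

Note that the closest pair by raw distance (taxa 3 and 4, at distance 3) is not the pair selected; the criterion also accounts for how far each taxon is from all the others.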

    A method to find groups of orthologous genes across multiple genomes

    In this work we propose a simple method to obtain groups of homologous genes across multiple (k) organisms, called kGC. Our method takes as input all-against-all Blastp comparisons and produces groups of homologous sequences. First, homologies among groups of paralogs of all the k compared genomes are found, followed by homologies of groups among k - 1 genomes and so on, until groups belonging exclusively to only one genome, that is, groups of one genome not presenting strong similarities with any group of any other genome, are identified. We have used our method to determine homologous groups across six Actinobacterial complete genomes. To validate kGC, we first investigate the Pfam classification of the homologous groups, and then compare our results with those produced by OrthoMCL. Although kGC is much simpler than OrthoMCL, it presented similar results with respect to Pfam classification.

    Bioinformatics of the sugarcane EST project

    The Sugarcane EST project (SUCEST) produced 291,904 expressed sequence tags (ESTs) in a consortium that involved 74 sequencing and data mining laboratories. The Bioinformatics Laboratory (LBI) created a web site for this project that served as a "meeting point" for receiving, processing, analyzing, and providing services to help explore the sequence data. In this paper we describe the information pathway that we implemented to support this project and briefly explain the clustering procedure, which resulted in 43,141 clusters. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

    Graphs from features: tree-based graph layout for feature analysis

    Feature analysis has become a critical task in data analysis and visualization. Graph structures are very flexible in terms of representation and may encode important information on features, but they are challenging to lay out adequately for analysis tasks. In this study, we propose and develop similarity-based graph layouts with the purpose of locating relevant patterns in sets of features, thus supporting feature analysis and selection. We apply a tree layout in the first step of the strategy, to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further grouping and relationships among features are revealed. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes to reveal the effectiveness of the feature set. Our results have shown that the tree-graph layout framework allows for a number of observations that are very important in user-centric feature selection, and not easy to observe with any other available tool. They provide a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential tasks in different scenarios of data analysis. Case studies in application areas centered on documents, images and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.

    Two-stage cervical esophagogastric anastomosis: 5 years of experience at the Hospital de Clínicas de Porto Alegre

    OBJECTIVE: Cervical esophagogastric anastomosis (CEA) is a common procedure used to restore the continuity of the digestive tract following curative or palliative surgery for esophageal cancer. At the Hospital de Clínicas de Porto Alegre (HCPA), the Esophagus, Stomach and Small Bowel Surgery Group carries out CEA in two stages: first, a lateral cervical esophagostomy is performed and the esophageal substitute is positioned in the neck; second, one week later, the esophageal remnant is sutured to the esophageal substitute. The esophageal substitute is either a gastric pull-up (GP) or a greater curvature gastric tube (GCGT), chosen according to whether the lesion can be resected. The objective of this paper is to describe the early results (up to 30 days) of delayed cervical esophagogastric anastomosis after resection or esophageal bypass procedures for esophageal neoplasia.
    MATERIAL AND METHODS: Fifty-nine patients fulfilled the inclusion criteria, of whom 49 were male and 55 were white; the mean age was 51.5 years. Twenty-two patients underwent gastric pull-up. The known risk factors for postoperative complications were similar in both groups. Tumor staging was the only preoperative difference between the two groups; this difference was expected given the criteria used for choosing the procedure.
    RESULTS: Leakage (cervical fistula) occurred in 7 patients (31.8%) in the GP group and 9 patients (24.3%) in the GCGT group (RR 1.3; 95% CI: 0.5-3.0, P = 0.54). Two patients (9.1%) in the GP group and 1 patient (2.7%) in the GCGT group died (RR 3.4; 95% CI: 0.3-34.9, P = 0.54). Infections occurred in 1 patient (4.5%) in the GP group and 7 patients (18.9%) in the GCGT group (RR 0.2; 95% CI: 0.1-1.8, P = 0.23). There were no differences between the groups regarding occurrence of leakage, short-term postoperative death (up to 30 days after surgery), or infections.
    CONCLUSIONS: Our results are similar to those of other reference centers for the treatment of esophageal cancer. In this study, we did not find any differences between the GP and GCGT groups regarding short-term postoperative complications.
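The relative risks reported above can be reproduced with a few lines of arithmetic. The sketch below assumes group sizes of 22 (GP) and 37 (GCGT); the second figure is inferred from the 59 included patients minus the 22 who underwent gastric pull-up and is not stated directly in the abstract. The function name `relative_risk` is illustrative, and confidence intervals and P values are not recomputed.

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Risk in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Assumed group sizes: 22 GP patients, 59 - 22 = 37 GCGT patients.
N_GP, N_GCGT = 22, 37

# Event counts (GP, GCGT) taken from the RESULTS section.
outcomes = {
    "leakage": (7, 9),
    "death": (2, 1),
    "infection": (1, 7),
}

for name, (gp, gcgt) in outcomes.items():
    rr = relative_risk(gp, N_GP, gcgt, N_GCGT)
    print(f"{name}: RR = {rr:.1f}")
```

With these assumed denominators, the computed values round to the reported relative risks of 1.3, 3.4 and 0.2.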